Abstract

To allow scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences
Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have
developed a system to mine data at the source without the need for network transfers. The system has been constructed by
linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements 
(S4PM), a processing engine at the GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining 
toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a 
workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow 
composer; and the EOS Clearinghouse (ECHO). 


XBaya is used to construct an analysis workflow at UAH using ADaM components, which are also installed remotely at the GES
DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria and stages them to
cache, allowing the ActiveBPEL engine to remotely orchestrate the processing workflow within S4PM. As mining is completed,
the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process,
while mining data at the data source using the server’s resources rather than transferring the full volume over the internet. These
diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying
technologies. The key to this infusion is the loosely coupled, Web Services-based architecture: all of the participating
components are accessible (one way or another) through SOAP (Simple Object Access Protocol)-based Web Services.
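As a rough illustration of what a SOAP-based request carrying a workflow URL might look like, the sketch below builds a SOAP 1.1 envelope with Python's standard library. Only the SOAP envelope namespace is standard; the service namespace `urn:example:mining-service`, the operation name `submitWorkflow`, the element name `workflowUrl`, and the example URL are assumptions made for this sketch and do not describe the actual GES DISC interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MINING_NS = "urn:example:mining-service"  # hypothetical service namespace


def build_mining_request(workflow_url: str) -> str:
    """Return a SOAP 1.1 envelope carrying a workflow URL (illustrative only)."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # "submitWorkflow" and "workflowUrl" are made-up names for this sketch.
    request = ET.SubElement(body, f"{{{MINING_NS}}}submitWorkflow")
    url = ET.SubElement(request, f"{{{MINING_NS}}}workflowUrl")
    url.text = workflow_url
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_mining_request("http://example.org/workflows/demo?wsdl"))
```

Because every component only needs to produce and consume messages of this general shape, the components can stay loosely coupled regardless of the language each is written in.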


GES DISC 
Mining Web Services Architecture

ADaM

Command-line data mining algorithms from UAH, wrapped as Web Services

Web Services enable the infusion of diverse technologies

XBaya: mining workflow composer

• User authors workflow and deploys to ActiveBPEL engine.

ActiveBPEL: workflow orchestration engine

• Exposes a URL pointing to the WSDL for that workflow

• Workflow URL is sent to the GES DISC Data Mining Services via SOAP

EOS Clearinghouse (ECHO):

• Discovery service for data files

Simple, Scalable, Script-Based Science Processor for Measurements (S4PM):

• Processing engine searches ECHO for data, and caches locally

• Invokes ActiveBPEL to execute ADaM via web service according to workflow

• Stages output to an FTP area for pickup by external user.
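The sequence above (space-time search, local caching, workflow execution, output staging) can be sketched in a few lines of Python. This is a minimal in-memory stand-in under stated assumptions, not the S4PM, ECHO, or ActiveBPEL implementation: the `Granule` record, the `search` and `run_pipeline` functions, and the string-transforming workflow steps are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass(frozen=True)
class Granule:
    """Hypothetical catalog record for one data file."""
    name: str
    bbox: BBox
    start: datetime
    end: datetime


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes intersect (no dateline handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog: List[Granule], bbox: BBox,
           start: datetime, end: datetime) -> List[Granule]:
    """Stand-in for a space-time catalog query."""
    return [g for g in catalog
            if overlaps(g.bbox, bbox) and g.start <= end and start <= g.end]


def run_pipeline(catalog: List[Granule], bbox: BBox, start: datetime,
                 end: datetime,
                 workflow: List[Callable[[str], str]]) -> Dict[str, str]:
    """Search, 'stage' matches, apply each workflow step in order, collect outputs."""
    cache = [g.name for g in search(catalog, bbox, start, end)]  # staging stand-in
    holding_area: Dict[str, str] = {}  # stand-in for the output pickup area
    for name in cache:
        result = name
        for step in workflow:
            result = step(result)
        holding_area[name] = result
    return holding_area


if __name__ == "__main__":
    catalog = [
        Granule("granule_a", (-90.0, 30.0, -80.0, 40.0),
                datetime(2007, 1, 1), datetime(2007, 1, 2)),
        Granule("granule_b", (10.0, 10.0, 20.0, 20.0),
                datetime(2007, 1, 1), datetime(2007, 1, 2)),
    ]
    workflow = [lambda s: s + ".cropped", lambda s: s + ".classified"]
    print(run_pipeline(catalog, (-88.0, 32.0, -85.0, 35.0),
                       datetime(2007, 1, 1), datetime(2007, 1, 3), workflow))
```

In the sketch only the granules matching the query are touched, which mirrors the point of the architecture: processing happens next to the data, and only the results need to leave the data center.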


XBaya


Web Service workflow authoring tool from Indiana University,
with modifications from UAH






ADaM

• Data mining toolkit developed by UAH

• Includes image processing, pattern recognition and other complex algorithms

• Includes over 100 scientific utilities

• Customizable as well as traditional data mining capabilities

Pattern Recognition

• Classification Techniques 

• Bayes Classifier 

• Naive Bayes Classifier 

• Bayes Network Classifier 

• Classifier 

• And more... 


Image Processing 

• Arithmetic Operations (+, -, *, /)

• Collaging

• Cropping

• Image Difference

• Image Normalization

• And more... 
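To give a concrete sense of two of the image operations listed above, the following sketch implements a pixel-wise image difference and a min-max normalization on images stored as nested lists. It is a generic illustration, not ADaM code: the function names are hypothetical, images are assumed to be non-empty, and min-max rescaling is only one common meaning of "normalization" (the toolkit may define it differently).

```python
from typing import List

Image = List[List[float]]  # rows of pixel values


def image_difference(a: Image, b: Image) -> Image:
    """Pixel-wise a - b for two images of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def normalize(image: Image) -> Image:
    """Min-max rescale pixel values to [0, 1]; a constant image maps to zeros."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


if __name__ == "__main__":
    before = [[5.0, 7.0], [9.0, 11.0]]
    after = [[1.0, 2.0], [3.0, 4.0]]
    print(normalize(image_difference(before, after)))
```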
S4PM

Perl data processing engine from GES DISC; triggers Web Service workflow
via ActiveBPEL engine

ActiveBPEL

Remotely hosted Web Service orchestration

[Architecture diagram labels: Invoke Workflow; Transmit Output; University of Alabama in Huntsville; Data Center (GES DISC)]
XBaya

• Java-based client-side GUI

• Composes and monitors workflows for Web Services

• Hides the complexities of Business Process Execution Language (BPEL)

• Decouples workflow execution from composition

• Deploys WSDL workflows to different BPEL workflow engines (e.g. ActiveBPEL)

• Saves workflows to invoke later


ActiveBPEL

• Open source, Java-based implementation of a BPEL engine

• Reads WSDL file from XBaya

• Orchestrates processes from initial stage to execution

• Manages flow control, alarms and other executions



S4PM

• Flexible Perl-based processing engine for Mining Web Services

• Used heavily in all GES DISC processing applications

• Robust and reliable tool for process automation

• Capable of accessing large online data collection via ECHO search

• Customizable to meet the needs of most data mining applications

• Open source


Conclusion

Earth Science Mining Web Services, created from an infusion of well-known technologies, have shown promising results for the data mining and scientific communities.
With an abundance of algorithms available, users can create and execute their data mining workflows without any data transfer. In turn, this gives users control over the
data they want to process at the data source. The next phase will be the Smart Assistant for Earth Science Data Mining (SAM). SAM will provide data type and mining
ontologies to aid in workflow composition, expand the existing workflow composer tool, and deploy existing mining services in additional environments.




